Advances in stringology and applications: from combinatorics via genomic analysis to computational linguistics
Abstract
Written text is considered one of the oldest methods of representing knowledge. A text can be defined as a logical and consistent sequence of symbols that encodes information in a certain language. A straightforward example is natural language, which humans typically use to communicate in spoken or written form. Other examples are DNA, RNA and protein sequences. DNA and RNA are nucleic acids that carry genetic instructions: they specify the sequence of the amino acids within proteins and regulate the development and functioning of living organisms. Proteins are molecules consisting of one or more chains of amino acids, and they participate in virtually every process within cells. DNA and RNA can be represented as sequences of the nucleobases of their nucleotides, and proteins can be represented by the sequence of amino acids encoded in the corresponding gene. A natural problem that emerges when processing such sequences is determining whether a specific pattern occurs within another string (the exact string matching problem). For natural language texts, an important problem in computational linguistics is finding the occurrences of a given word or sentence in a volume of text; similarly, in computational biology, identifying given features in DNA sequences is a problem of great significance. On the other hand, one is often interested in quantifying the likelihood that two strings share the same underlying features, based on an explicit similarity or dissimilarity measure (the approximate string matching problem). Both instances of the string matching problem have been studied thoroughly since the early 1960s.
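To make the two problem variants concrete, here is a minimal Python sketch, assuming the Knuth-Morris-Pratt prefix function for exact matching and the Levenshtein (edit) distance as the dissimilarity measure for approximate matching. The abstract names no particular algorithm or measure; these are standard textbook choices used only for illustration, and the function names are hypothetical.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Exact string matching: every start index at which pattern occurs in text."""
    if not pattern:
        return list(range(len(text) + 1))
    pi = prefix_function(pattern)
    hits, k = [], 0  # k = length of the currently matched prefix of pattern
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = pi[k - 1]  # fall back to the longest border instead of rescanning text
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = pi[k - 1]  # allow overlapping occurrences
    return hits


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions, each of cost 1),
    computed row by row in O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # substitute, or match at cost 0
        prev = cur
    return prev[-1]


print(kmp_search("ACGTACGTAC", "ACGTAC"))  # [0, 4]
print(edit_distance("kitten", "sitting"))  # 3
```

Exact matching answers "where does the pattern occur?", while the edit distance answers "how far apart are two strings?"; an approximate matcher would typically report positions where that distance stays below a chosen threshold.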
This thesis contributes several efficient novel and derived solutions (algorithms and/or data structures) for complex problems that originate either from theoretical considerations or from practical problems; it studies their experimental performance and compares the proposed solutions with existing ones. Among the latter, several are motivated by real-world problems in the fields of molecular biology and computational linguistics. Although the studied problems and their proposed solutions differ in research motivation, they still use similar tools and methodologies. For example, the seminal Aho-Corasick automaton is employed both for finding a set of motifs in a biological sequence and for detecting spelling mistakes in Arabic text. Similarly, the bit-masking trick is used to extend the DNA alphabet in order to accelerate equivalence testing of degenerate characters, and in the same way to extend the Arabic alphabet in order to measure the similarity between a stem and the derived/inflected forms of a given word.

To: Israa, Hamza, Nasrallah and Laila.
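The two shared tools named in the abstract can be sketched in Python as follows. The Aho-Corasick part is a generic textbook construction (a trie, failure links computed breadth-first, and merged output sets), not the thesis's own implementation. The bit-masking part assumes a 4-bit encoding of the IUPAC nucleotide codes (A=1, C=2, G=4, T=8, each degenerate symbol being the OR of the bases it stands for), so two symbols are treated as equivalent when their masks share a bit. All function names and the naive scan in `find_degenerate` are hypothetical, and the Arabic-alphabet extension is not shown.

```python
from collections import deque

# --- Aho-Corasick: match a whole set of patterns (e.g. DNA motifs) in one pass ---

def build_automaton(patterns):
    goto = [{}]  # goto[state][char] -> child state in the trie
    fail = [0]   # failure link: state of the longest proper suffix present in the trie
    out = [[]]   # indices of the patterns recognised at each state
    for idx, p in enumerate(patterns):
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(idx)
    # Breadth-first order guarantees shallower states are finished first.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            out[t] = out[t] + out[fail[t]]  # inherit patterns that are suffixes
    return goto, fail, out


def aho_corasick_search(text, patterns):
    """Return (start, pattern) for every occurrence, ordered by end position."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits

# --- Bit-masking degenerate (IUPAC) nucleotide symbols ---

A, C, G, T = 1, 2, 4, 8  # one bit per base (assumed encoding)
IUPAC = {
    "A": A, "C": C, "G": G, "T": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T, "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
}


def degenerate_match(x, y):
    """Two symbols are equivalent if they can stand for a common base."""
    return (IUPAC[x] & IUPAC[y]) != 0


def find_degenerate(text, pattern):
    """Naive scan in which each character comparison is a single bitwise AND."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if all(degenerate_match(t, p) for t, p in zip(text[i:i + m], pattern))]


print(aho_corasick_search("ACGTTGCA", ["CG", "GTT", "TTG", "GCA"]))
# [(1, 'CG'), (2, 'GTT'), (3, 'TTG'), (5, 'GCA')]
print(find_degenerate("ACGTTGCA", "YG"))  # [1, 4]
```

With this encoding, deciding whether two degenerate symbols are compatible costs one AND instead of comparing the sets of bases each symbol represents.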
Publication date: 2015